ABSTRACT

Large-scale structures in the Universe are a powerful tool to test cosmological models and constrain cosmological parameters. A particular feature of interest comes from Baryon Acoustic Oscillations (BAOs), which are sound waves traveling in the hot plasma of the early Universe that stopped at the recombination time. This feature can be observed as a localized bump in the correlation function at the scale of the sound horizon $r_s$. As such, it provides a standard ruler and substantial constraining power in the correlation function analysis of galaxy surveys. Moreover, the detection of BAOs at the expected scale gives strong support to cosmological models. Both of these studies (BAO detection and parameter constraints) rely on a statistical modeling of the measured correlation function $\hat{\xi}$. Usually $\hat{\xi}$ is assumed to be Gaussian, with a mean depending on the cosmological model and a covariance matrix $C$ generally approximated as a constant (i.e. independent of the model). In this article we study whether a realistic model-dependent $C_\theta$ changes the results of cosmological parameter constraints compared to the approximation of a constant covariance matrix $C$. For this purpose, we use a new procedure to generate lognormal realizations of the Luminous Red Galaxies sample of the Sloan Digital Sky Survey Data Release 7 to obtain a model-dependent $C_\theta$ in a reasonable time. The approximation of $C_\theta$ as a constant creates small changes in the cosmological parameter constraints on our sample. We quantify this modeling error using a large number of simulations and find that it only has a marginal influence on cosmological parameter constraints for current and next-generation galaxy surveys. It can be approximately taken into account by extending the $1\sigma$ intervals by a factor $\approx 1.3$.
Subject headings: large-scale structure of Universe - distance scale - dark energy - cosmological parameters



1. INTRODUCTION 

One of the most important questions in modern cosmology is to understand the nature of dark energy. This mysterious form of energy is responsible for the accelerated expansion of the Universe, and seems to account for more than 70% of the energy content of the Universe (see e.g. Komatsu et al. 2009; Amanullah et al. 2010; Blake et al. 2011b).

The acceleration of the expansion of the Universe was first measured with high-redshift supernovae (Riess et al. 1998; Perlmutter et al. 1999). The principle is to use Type Ia supernovae as standard candles in order to probe the redshift-distance relation. The same principle has been used more recently in the study of galaxy clustering at low redshift using Baryon Acoustic Oscillations (BAOs; Bassett & Hlozek 2010). These structures are remnants of acoustic waves which travelled in the plasma before recombination, when baryons and photons were coupled together. Their absolute size is given by the sound horizon scale at the baryon drag epoch, and is well constrained by measurements of the Cosmic Microwave Background (CMB), $r_s = 153.3 \pm 2$ Mpc (Komatsu et al. 2009). Thus they can be used as a standard ruler to probe the redshift-distance relation.

BAOs are a very promising cosmological probe because they are less affected by systematics than other methods (Albrecht et al. 2009). They can also be very useful to cross-check results from other probes. This has been done for example in Blake et al. (2011b), where the combination of the WiggleZ, Sloan Digital Sky Survey (SDSS) and 6-degree Field (6dF) surveys has been used to cross-check supernovae results. As future experiments will provide more precise information, it will be critical to correctly analyze and combine these different probes. In particular, one might face new challenges in dealing with systematic effects that were below the statistical uncertainty in previous experiments and that become important.

Possible systematics can come from incorrect statistical modeling of the data. For example in the case of BAOs in large-scale clustering, a classical procedure is to measure the correlation function $\hat{\xi}$ and fit it to an expected correlation function $\xi_\theta$ with a dependence on cosmological parameters $\theta$. More precisely, one assumes a statistical model for $\hat{\xi}$ as a function of $\theta$ in order to compute the likelihood $\mathcal{L}_\theta(\hat{\xi})$. A common statistical model is to consider that $\hat{\xi}$ is simply Gaussian, centered on the expected correlation $\xi_\theta$ and with a constant covariance matrix $C$ (i.e. independent of $\theta$).
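To make the role of the covariance matrix concrete, the following is a minimal sketch of the Gaussian model written as $-2 \ln \mathcal{L}$ up to an additive constant. The function name and the toy numbers are illustrative assumptions, not taken from our analysis. For a constant $C$ the log-determinant term is the same for all $\theta$ and can be dropped; for a model-dependent $C_\theta$ it varies with $\theta$ and has to be kept.

```python
import numpy as np

def neg2_log_like(xi_hat, xi_model, cov):
    """-2 ln L of a Gaussian data vector, up to an additive constant.

    xi_hat   : measured correlation function (n_bins,)
    xi_model : expected correlation function for the tested theta (n_bins,)
    cov      : covariance matrix for the tested theta (n_bins, n_bins)
    """
    resid = np.asarray(xi_hat, dtype=float) - np.asarray(xi_model, dtype=float)
    chi2 = resid @ np.linalg.solve(cov, resid)
    # slogdet avoids overflow/underflow of the determinant itself
    _, logdet = np.linalg.slogdet(cov)
    return chi2 + logdet

# Toy 2-bin example (illustrative numbers only)
xi_hat = np.array([1.0, 2.0])
xi_model = np.array([0.0, 0.0])
cov = np.diag([1.0, 4.0])
value = neg2_log_like(xi_hat, xi_model, cov)
```

In this toy case the chi-square term equals 2 and the log-determinant term equals $\ln 4$.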

The Gaussianity has been shown to be well verified, e.g. in Labatie et al. (2012b) and Manera et al. (2012). However the approximation of a constant covariance $C$ has not been well studied, probably because it is very difficult to estimate a model-dependent covariance matrix $C_\theta$. Indeed the usual procedure to estimate a covariance matrix is to use a large number of realistic mock catalogues and compute the empirical covariance matrix

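$$\hat{C} = \frac{1}{N-1} \sum_{k=1}^{N} (\hat{\xi}_k - \bar{\xi}) (\hat{\xi}_k - \bar{\xi})^T \qquad (1)$$

$$\bar{\xi} = \frac{1}{N} \sum_{k=1}^{N} \hat{\xi}_k \qquad (2)$$

where $\hat{\xi}_k$ is the correlation function measured on the $k$-th mock catalogue.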

Having a good estimate of the covariance matrix requires a lot of simulations. This procedure can already be long for one value of $\theta$, and it seems infeasible to apply it on a multi-dimensional grid of $\theta$ values.
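As an illustration of this procedure, here is a minimal sketch of the empirical mean and covariance computed from a stack of mock measurements. The array layout (one row per mock), the function name and the $1/(N-1)$ normalization are assumptions of the sketch, and the random input only stands in for real mock measurements.

```python
import numpy as np

def empirical_covariance(xi_mocks):
    """Empirical mean and covariance of mock correlation functions.

    xi_mocks : array of shape (n_mocks, n_bins), one measured
               correlation function per mock catalogue.
    """
    xi_mocks = np.asarray(xi_mocks, dtype=float)
    n_mocks = xi_mocks.shape[0]
    xi_mean = xi_mocks.mean(axis=0)
    dev = xi_mocks - xi_mean
    # Unbiased normalization 1/(N-1), an assumption of this sketch
    cov = dev.T @ dev / (n_mocks - 1)
    return xi_mean, cov

# Stand-in data: 2000 fake "mocks" with 18 bins each
rng = np.random.default_rng(0)
mocks = rng.normal(size=(2000, 18))
xi_mean, cov = empirical_covariance(mocks)
```

The cost of this estimate is dominated by producing the mocks themselves, which is why repeating it on a grid of $\theta$ values is expensive.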

As an alternative, one could find analytical formulae to estimate the covariance matrix of the correlation function $\hat{\xi}$. A recent attempt has been made in Xu et al. (2012). It starts from the analytic computation of the covariance matrix of $\hat{\xi}$ for a Gaussian density field. The covariance matrix is further modified to better match the empirical covariance matrix on mock catalogues. It is shown to reproduce the empirical covariance matrix obtained with mock catalogues, while regularizing it.

This is very interesting because it provides with little effort the covariance matrices for different input power spectra $P(k)$ of the galaxy field, i.e. a model-dependent covariance matrix $C_\theta$. However the procedure is not totally blind and requires an ad hoc fitting of the covariance matrix to mock catalogues for a given model. In particular it has not been shown that the resulting model-dependent covariance matrix $C_\theta$ is also a good estimate for models other than the one used for the fitting.

In this article we do not study this question of analytically modeling the covariance matrix. Instead we study whether this modeling is actually required, i.e. whether the model-dependence of $C_\theta$ affects the statistical analysis (e.g. by changing confidence regions). We restrict ourselves to cosmological parameter constraints using the correlation function (we will not look at the question of BAO detection for reasons explained in section 3.2).

For our analysis to be feasible, we will only consider 3 parameters in $\theta$ that have the most impact on the expected correlation function $\xi_\theta$. The first parameter is the matter density $\omega_m = \Omega_m h^2$, which determines the horizon scale at matter-radiation equality ($\propto \omega_m^{-1}$). It also has a small influence on the sound horizon scale ($\propto \omega_m^{-0.25} \omega_b^{-0.08}$, with $\omega_b = \Omega_b h^2$ the baryon density) and changes the amplitude of the BAO peak (for a constant $\omega_b$). The second parameter is $\alpha$, which determines how the correlation function is dilated when using a fiducial cosmology instead of the true cosmology to convert redshifts into distances. This parameter is the one that really probes the distance-redshift relation and it is mostly constrained by the position of the BAO peak. Finally the third parameter is a constant bias $B = b^2$ in the correlation function that accounts for different amplitude effects (linear redshift distortions, linear galaxy bias, amplitude of matter fluctuations $\sigma_8$).

As we will estimate the covariance matrices using mock catalogues, a parameterization of $C_\theta$ with a 3-dimensional parameter $\theta = (\omega_m, \alpha, B)$ may already seem infeasible. However we will show how to optimize our simulations and the computation of the correlation function in order to make it feasible. We will show that there is in fact only 1 parameter that needs to be varied, and that the 2 other parameters can be taken into account without adding much effort.

The plan of this paper is as follows: we start in section 2 by describing the SDSS DR7-Full data catalogue that we use. In section 3 we discuss the correlation function modeling and estimation. Section 4 presents our new procedure to estimate a model-dependent covariance matrix $C_\theta$ with a 3-dimensional parameter $\theta = (\omega_m, \alpha, B)$ in a reasonable time. In section 5 we give results on the statistical modeling of the correlation estimator $\hat{\xi}$: absence of bias in $\hat{\xi}$, Gaussianity of $\hat{\xi}$, dependence of the covariance matrix $C_\theta$ on $\omega_m$, $\alpha$ and $B$. Finally in section 6 we study the modeling error in parameter constraints due to the approximation of $C_\theta$ as a constant $C$. We study this modeling error on the SDSS DR7-Full $\hat{\xi}$ and we perform a quantitative analysis using simulations.

2. DATA CATALOGUE 

In this study we use the Luminous Red Galaxies (LRG) sample of the last Data Release 7 (DR7) of the SDSS. LRGs are selected using the algorithm in Eisenstein et al. (2001), which consists of different luminosity and color cuts using the five passbands u, g, r, i and z. These galaxies are very luminous and good tracers of massive dark matter haloes. The sample is quasi-volume-limited (i.e. nearly of constant density) up to redshift $z \approx 0.36$ and extends up to $z \approx 0.47$ in a flux-limited way. In order to convert redshifts into distances we use a flat $\Lambda$CDM fiducial cosmology with $\Omega_m = 0.25$. We plot the resulting density of the catalogue in figure 1.

We use the DR7-Full sample of the analysis in Kazin et al. (2010) that is available online$^1$ and has the characteristics given in table 1.

TABLE 1

Quantity                                   Value
# of LRGs                                  105,831
$z_{\min}$                                 0.16
$z_{\max}$                                 0.47
$\langle z \rangle$                        0.324
$M_{g,\min}$                               -23.2
$M_{g,\max}$                               -21.2
$\langle M_g \rangle$                      -21.72
Area (deg$^2$)                             7,908
Volume ($h^{-3}$ Gpc$^3$)                  1.58
Density ($10^{-5}\,h^3$ Mpc$^{-3}$)        6.70

NOTES. - Characteristics of the SDSS LRG sample DR7-Full used, from Kazin et al. (2010). Volume and density have been computed with a flat $\Lambda$CDM fiducial cosmology with $\Omega_m = 0.25$.

The sample is mostly contiguous, with only 9.8% outside of the main part of the Northern Galactic Cap. The number of LRGs is equal to 96,763 in the Northern Galactic Cap and 9,068 in the Southern Galactic Cap. We show the footprint of the survey in figure 2 with the Northern contiguous part and the few stripes in the Southern part (the blue line represents the Galactic plane).

1 http://cosmo.nyu.edu/~eak306/SDSS-LRG.html
Fig. 1. - Observed density of the sample DR7-Full when using a flat $\Lambda$CDM fiducial cosmology with $\Omega_m = 0.25$ to convert redshifts into distances.

Fig. 2. - SDSS DR7-Full sample sky coverage in Aitoff projection. The solid blue line represents the Galactic plane which separates the Northern Contiguous region and the Southern region.

3. CORRELATION FUNCTION MODELING AND 
ESTIMATION 

3.1. Correlation function modeling 

The correlation function is a second order statistic that measures the clustering of a continuous field or a point process. For the galaxy field, it measures the excess of probability to find a pair of galaxies in volumes $dV_1$ and $dV_2$ separated by $\mathbf{r}$ compared to a random unclustered distribution

$$dP_{12} = \bar{n}^2 \, [1 + \xi(\mathbf{r})] \, dV_1 \, dV_2 \qquad (3)$$

with $\bar{n}$ the mean density of points. Due to the cosmological principle the correlation function $\xi(\mathbf{r})$ is isotropic so that it only depends on the norm of the separation vector $r = \|\mathbf{r}\|$. However we do not exactly measure the correlation function for two reasons:

• We observe galaxies in redshift space so that there 
are redshift distortions in the line of sight direction 



• The choice of fiducial cosmology dilates the galaxy 
survey differently in the line of sight and transverse 
directions 

As explained later the second effect can be neglected, i.e. we can model the effect of a wrong fiducial cosmology by a single dilation factor $\alpha$ in all directions. We still want to measure the correlation function as a function of $r = \|\mathbf{r}\|$, so we will consider the monopole in redshift space that we denote $\xi(r)$ (and still refer to it as the correlation function, as is done in most studies)

$$\xi(r) = \frac{1}{4\pi} \int \xi(\mathbf{r}) \, d\Omega \qquad (4)$$

In the plane parallel approximation and in the linear regime on large scales, the monopole correlation function in redshift space is linked to the correlation function in real space by a constant multiplicative factor independent of scale (Kaiser 1986).

When considering CDM models, the linear power spectrum can be computed up to an amplitude factor for given matter density $\omega_m$, baryon density $\omega_b$ and spectral tilt $n_s$. In our analysis we neglect the effect of $\omega_b$ and $n_s$ because they are well constrained by WMAP data (Komatsu et al. 2009). We fix them at the maximum likelihood values of WMAP7, $\omega_b = 2.227 \times 10^{-2}$ and $n_s = 0.966$ (we will also fix the parameter $\sigma_8 = 0.81$ for normalizing the linear power spectrum). So the only parameter of the linear power spectrum that we vary is the matter density $\omega_m$.

A prominent feature of the linear correlation function is the BAO peak at scale $\approx 150$ Mpc, which is due to sound waves traveling in the hot plasma before recombination, when photons and baryons were coupled together. Note however that the BAO peak is not the only effect of baryons in the linear correlation function, and that they also suppress the amplitude of fluctuations on small and intermediate scales.

Then we have to take into account the non-linear effects in the galaxy field. The first effect is due to the non-linear evolution of the matter density field, where recent advances in modeling have been made using Renormalized Perturbation Theory (RPT; Crocce & Scoccimarro 2006). Using RPT, it has been shown in Sánchez et al. (2008) that one can have an excellent description of the correlation function for the range of scales $60\,h^{-1}$ Mpc $< r < 180\,h^{-1}$ Mpc.

In this study we use a simple model for the non-linear evolution of the matter density field. We use the HALOFIT procedure (Smith et al. 2003), which provides corrections for scale-free power spectra using N-body simulations. Because these simulations do not include the BAO feature we also have to correct for the non-linear degradation of the acoustic peak. Eisenstein et al. (2007) found that it is well approximated by a Gaussian smoothing of the acoustic feature both in redshift and in real space.

The power spectrum with degraded peak $P_{\mathrm{damped},L}$ is obtained using the linear power spectrum $P_L$ and the linear 'no wiggles' power spectrum of Eisenstein & Hu (1998), $P_{\mathrm{nowig},L}$

$$P_{\mathrm{damped},L}(k) = P_{\mathrm{nowig},L}(k) + e^{-a^2 k^2 / 2} \, [P_L(k) - P_{\mathrm{nowig},L}(k)] \qquad (5)$$
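The damping step can be sketched numerically as follows. This is a minimal illustration assuming the Gaussian damping takes the form $\exp(-a^2 k^2 / 2)$; the function name and the toy input arrays are hypothetical, and real inputs would be a linear spectrum and its 'no wiggles' counterpart tabulated on the same $k$ grid.

```python
import numpy as np

def damped_linear_power(k, p_lin, p_nowig, a):
    """Linear power spectrum with a Gaussian-damped acoustic feature.

    k       : wavenumbers (same length unit convention as 1/a)
    p_lin   : linear power spectrum at k
    p_nowig : 'no wiggles' linear power spectrum at k
    a       : damping scale (assumed exp(-a^2 k^2 / 2) form)
    """
    k = np.asarray(k, dtype=float)
    damping = np.exp(-0.5 * (a * k) ** 2)
    # Only the oscillatory part p_lin - p_nowig is damped
    return p_nowig + damping * (np.asarray(p_lin) - np.asarray(p_nowig))

# Toy check of the two limits: no damping at k = 0, full damping at large k
k = np.array([0.0, 1.0e3])
p_damped = damped_linear_power(k, p_lin=np.array([2.0, 2.0]),
                               p_nowig=np.array([1.0, 1.0]), a=9.5)
```

At $k = 0$ the result equals the linear spectrum, and at very large $k$ it tends to the 'no wiggles' spectrum.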
To take into account the scale-free non-linear effect, we apply to the damped power spectrum the same non-linear correction as for the scale-free power spectrum $P_{\mathrm{nowig},L}(k)$

$$P_{\mathrm{damped},NL}(k) = \frac{P_{\mathrm{nowig},NL}(k)}{P_{\mathrm{nowig},L}(k)} \, P_{\mathrm{damped},L}(k) \qquad (6)$$

where $P_{\mathrm{nowig},NL}(k)$ is computed from $P_{\mathrm{nowig},L}(k)$ using the HALOFIT formula in Smith et al. (2003). We compute these power spectra using the iCosmo IDL library (Refregier et al. 2011).

It remains to set the value of $a$ in formula (5) and to model the scale-dependent galaxy bias with respect to the matter density field. For these purposes we use the Large Suite of Dark Matter Simulations (LasDamas; McBride et al. 2012, in prep.). These simulations are designed to model the clustering of the SDSS DR7 for galaxies in a wide luminosity range. Galaxies are artificially placed in dark matter halos using a halo occupation distribution (HOD; Berlind & Weinberg 2002) with parameters set to match observations on the SDSS sample.

We use the gamma release of the Las Damas simulations and more precisely the Oriana simulations that are publicly available$^2$. They are composed of 40 N-body simulations, where each simulation can reproduce two times the 'North+South' SDSS footprint for a total of 80 realizations. Each N-body simulation contains $1280^3$ particles of mass $45.73 \times 10^{10}\,h^{-1} M_\odot$ with a softening parameter of $53\,h^{-1}$ kpc. The cosmological parameters of the simulations are $\Omega_m = 0.25$, $\Omega_\Lambda = 0.75$, $\Omega_b = 0.04$, $h = 0.7$, $\sigma_8 = 0.8$ and $n_s = 1$.

We use catalogues composed of LRG galaxies with $-23.2 < M_g < -21.2$, as in the DR7-Full sample. As these catalogues are nearly volume-limited, their redshift range ($0.16 < z < 0.36$) is smaller than that of the DR7-Full sample. However because of a non-evolving HOD model to populate dark matter halos, the galaxy number density $n(z)$ is slowly decreasing. To address this, we compute the correlation using the random catalogue provided by the Las Damas team, which has the same decreasing trend in its density.

We compute the correlation function using the Landy-Szalay estimator of formula (11). We average the measured correlation function over the 80 realizations so that we get a very good approximation of the real correlation function. On the other hand, we compute the power spectrum as in formula (6) using the Las Damas cosmological parameters. We apply the Hankel transform to this power spectrum in order to obtain the corresponding correlation function. First we adjust the parameter $a$ of equation (5) to reproduce the non-linear degradation in the simulations and we find that the value $a = 9.5\,h^{-1}$ Mpc gives a good result. Finally we adjust the scale-dependent galaxy bias $B(r)$ on small scales by dividing the Las Damas correlation by our model. We find a scale-dependent correction of $\approx 10\%$ at $r = 5\,h^{-1}$ Mpc which slowly decreases up to $r = 55\,h^{-1}$ Mpc.

We thus obtain the galaxy correlation function

$$\xi_{\mathrm{galaxy},\omega_m}(r) = B(r) \, \xi_{\mathrm{damped},NL}(r) \qquad (7)$$

where $\xi_{\mathrm{damped},NL}(r)$ is obtained by the Hankel transform of $P_{\mathrm{damped},NL}(k)$ of formula (6) with the choice $a = 9.5\,h^{-1}$ Mpc in equation (5). We keep $B(r)$ and $a$ fixed in our analysis, so that $\xi_{\mathrm{galaxy},\omega_m}$ only depends on the linear power spectra $P_L$ and $P_{\mathrm{nowig},L}$ of equation (5). And as we already explained, we only vary the parameter $\omega_m$ in the linear power spectra. So the correlation function $\xi_{\mathrm{galaxy},\omega_m}$ only has a dependence on $\omega_m$.

2 http://lss.phy.vanderbilt.edu/lasdamas/mocks/

We introduce two additional parameters in the model correlation function. The first parameter $\alpha$ accounts for a dilation of the galaxy survey due to an incorrect choice of fiducial cosmology to convert redshifts into distances. This parameter is actually the one that is probed by the localization of the BAO peak and the standard ruler property. It was shown that a wrong choice of fiducial cosmology approximately translates into a dilation of the galaxy survey and thus of the correlation function (Eisenstein et al. 2005; Padmanabhan & White 2008) by a factor $\alpha = D_V(z_{\mathrm{eff}}) / D_{V,\mathrm{fid}}(z_{\mathrm{eff}})$, with $z_{\mathrm{eff}} = 0.3$ the effective redshift of our sample, and $D_V(z)$ the 'dilation scale' at redshift $z$

$$D_V(z) = \left[ D_M(z)^2 \, \frac{cz}{H(z)} \right]^{1/3} \qquad (8)$$

where $H(z)$ is the Hubble parameter and $D_M(z)$ is the comoving angular diameter distance at redshift $z$. Our choice of a flat $\Lambda$CDM fiducial cosmology with $\Omega_m = 0.25$ gives $D_{V,\mathrm{fid}}(z_{\mathrm{eff}} = 0.3) = 1180$ Mpc.
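The fiducial dilation scale can be checked with a short numerical sketch. It assumes a flat $\Lambda$CDM background with matter and a cosmological constant only, and a value $h = 0.7$ to express the result in Mpc; the function names and the integration grid are choices of the sketch.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def hubble(z, omega_m, h):
    """H(z) in km/s/Mpc for flat LambdaCDM (matter + Lambda only)."""
    return 100.0 * h * np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def dilation_scale(z, omega_m, h, n_steps=2001):
    """D_V(z) = [D_M(z)^2 * c z / H(z)]^(1/3) in Mpc."""
    zz = np.linspace(0.0, z, n_steps)
    integrand = C_KM_S / hubble(zz, omega_m, h)
    # Comoving distance by the trapezoid rule (D_M = D_C in a flat universe)
    d_m = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zz))
    return (d_m ** 2 * C_KM_S * z / hubble(z, omega_m, h)) ** (1.0 / 3.0)

# Fiducial cosmology of the text, with h = 0.7 assumed for this sketch
d_v_fid = dilation_scale(0.3, omega_m=0.25, h=0.7)
```

Under these assumptions the result comes out close to the 1180 Mpc quoted above, and a dilation $\alpha$ then corresponds to $D_V(0.3) = \alpha \, D_{V,\mathrm{fid}}(0.3)$.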

Next we introduce a constant amplitude factor $b$ to model variations of $\sigma_8$, linear redshift distortions and linear galaxy bias. So we obtain the final model correlation function as a function of $\omega_m$, $\alpha$ and $B = b^2$

$$\xi_{\omega_m,\alpha,B}(r) = b^2 \, \xi_{\mathrm{galaxy},\omega_m}(\alpha r) \qquad (9)$$

Finally we bin the model correlation function in the same way as when it is estimated by pair counting, i.e. for a bin $[r_i - dr/2, r_i + dr/2]$

$$\xi_{\omega_m,\alpha,B}(r_i) = \frac{\int_{r_i - dr/2}^{r_i + dr/2} \xi_{\omega_m,\alpha,B}(r) \, r^2 \, dr}{\int_{r_i - dr/2}^{r_i + dr/2} r^2 \, dr} \qquad (10)$$

Throughout this study we use a $dr = 10\,h^{-1}$ Mpc binning from $20\,h^{-1}$ Mpc to $200\,h^{-1}$ Mpc, corresponding to $n = 18$ bins.
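The binning can be sketched as a volume-weighted average over each bin. The toy power-law correlation function, the helper name and the midpoint-rule integration are assumptions made for illustration; any callable model could be substituted.

```python
import numpy as np

def bin_average(xi_func, r_lo, r_hi, n_sub=200):
    """Average of xi(r) over [r_lo, r_hi] with weight r^2 (midpoint rule)."""
    r = np.linspace(r_lo, r_hi, n_sub + 1)
    mid = 0.5 * (r[1:] + r[:-1])
    weight = mid ** 2 * np.diff(r)
    return np.sum(xi_func(mid) * weight) / np.sum(weight)

# 18 bins of width 10 between 20 and 200 (units of h^-1 Mpc)
edges = np.arange(20.0, 200.0 + 1e-9, 10.0)

def toy_xi(r):
    """Toy model, NOT the model of the text: xi(r) = (r / 100)^-2."""
    return (r / 100.0) ** -2

binned = np.array([bin_average(toy_xi, lo, hi)
                   for lo, hi in zip(edges[:-1], edges[1:])])
```

The $r^2$ weight mirrors the fact that pair counts in a bin scale with the shell volume, so the binned model is compared with the estimator on an equal footing.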

3.2. Correlation function estimation 

Most estimators of the correlation function use random unclustered catalogues (i.e. Poisson catalogues with no correlation) and measure the excess of pairs of data points separated by a distance $r$ compared to pairs of random points. Different estimators have been proposed and compared (Pons-Bordería et al. 1999; Labatie et al. 2012a). The recommendation is to use either the Hamilton estimator (Hamilton 1993) or the Landy-Szalay estimator (Landy & Szalay 1993). They have been shown in Labatie et al. (2012a) to have lower variance than the other estimators and negligible bias for current galaxy surveys. Most studies use the Landy-Szalay estimator, and we will also use it here. It is given by

$$\hat{\xi}_{LS}(r) = \frac{N_{RR}}{N_{DD}} \frac{DD(r)}{RR(r)} - 2 \, \frac{N_{RR}}{N_{DR}} \frac{DR(r)}{RR(r)} + 1 \qquad (11)$$

with $DD(r)$, $RR(r)$, $DR(r)$ the number of pairs at a distance in $[r \pm dr/2]$ of respectively data-data, random-random, data-random points and $N_{DD}$, $N_{RR}$, $N_{DR}$ the total number of corresponding pairs in the catalogues.
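Given binned pair counts, the estimator is a one-line computation. The sketch below assumes unweighted catalogues and counts each unordered pair once, so that the totals are $N(N-1)/2$ for auto-pairs and $N_D N_R$ for cross-pairs; the function name and the toy counts are illustrative.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_random):
    """Landy-Szalay estimator from binned pair counts.

    dd, dr, rr : pair counts per separation bin
    n_data     : number of data points
    n_random   : number of random points
    """
    dd, dr, rr = (np.asarray(x, dtype=float) for x in (dd, dr, rr))
    # Total pair numbers (unordered pairs, an assumption of this sketch)
    n_dd = n_data * (n_data - 1) / 2.0
    n_rr = n_random * (n_random - 1) / 2.0
    n_dr = float(n_data * n_random)
    return (n_rr / n_dd) * dd / rr - 2.0 * (n_rr / n_dr) * dr / rr + 1.0

# Toy counts in which data pairs follow the random pair fractions exactly
frac = np.array([0.01, 0.02])
n_d, n_r = 100, 1000
dd = frac * n_d * (n_d - 1) / 2.0
rr = frac * n_r * (n_r - 1) / 2.0
dr = frac * n_d * n_r
xi_unclustered = landy_szalay(dd, dr, rr, n_d, n_r)
xi_clustered = landy_szalay(2.0 * dd, dr, rr, n_d, n_r)
```

In this toy case the first call returns zero in both bins, and doubling the data-data counts gives a correlation of one.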

Formula (11) corresponds to the case where all galaxies are weighted equally in the estimator. This is optimal for volume-limited surveys but it is not optimal when the galaxy mean density depends on redshift. An approximately optimal weighting, which depends on the distance $r$ at which we estimate the correlation function, is given in Hamilton (1993) by

$$w_i(r) = \frac{1}{1 + \bar{n} \, \Phi_i \, J(r)} \qquad (12)$$

where $\Phi_i$ is the selection function at the position of the galaxy $i$, $\bar{n}$ is the expected density of the catalogue before the selection function is applied and $J(r)$ is the integral of the real correlation function

$$J(r) = \int_{V_r} \xi(s) \, d^3s = 4\pi \int_0^r \xi(s) \, s^2 \, ds \qquad (13)$$

There is still a constraint not to introduce a bias, which is that the weighted densities of the random catalogue and data catalogue must be proportional (i.e. there can only be a multiplicative factor of difference between the two). When introducing weights as in formula (12) the pair-counting quantities ($DD$, $RR$, $DR$) are modified in the Landy-Szalay estimator of equation (11). Instead of adding $+1$ for each pair, we simply add $w_i w_j$, with $w_i$ and $w_j$ the weights of each point of the pair.

When computing the correlation function of the DR7-Full sample we do not try to apply such optimal weights. We only take care of the fiber collision problem which locally changes the density of galaxies. We apply the same weights as in Kazin et al. (2010), which upweight groups of galaxies that are close enough to be affected by fiber collisions. The angular incompleteness and the varying density with redshift are taken into account in the random catalogue. So overall the weighted densities in the data and random catalogues are proportional.

We use the same random catalogue as in Kazin et al. (2010), which is also available online$^3$. It is composed of $\approx 1.66$ million points, i.e. $\approx 16$ times the number of galaxies in the data.

We plot in figure 3 the measured correlation function of the data sample, with a BAO peak a bit wider than expected. This was also found in Martínez et al. (2009) on an SDSS DR7 LRG volume-limited sample. Yet the study of Kazin et al. (2010) concludes that this is not due to systematics but only to signal variance. Note also that the BAO reconstruction technique used in Padmanabhan et al. (2012) on the same sample leads to a sharpening of the BAO peak. However, without applying this technique or introducing nuisance parameters, the wide BAO peak results in a low BAO detection level and also a shift towards values $\alpha < 1$ (see section 6).

Many studies on the clustering of the SDSS DR7 LRG sample focused only on the position of the BAO peak. This is done either by using peak finding techniques as in Kazin et al. (2010), or by introducing nuisance parameters for the global shape of the correlation function (or power spectrum) which are marginalized over (e.g. spline functions in Percival et al. (2010) or inverse polynomials in Xu et al. (2012)).

3 http://cosmo.nyu.edu/~eak306/SDSS-LRG.html

Fig. 3. - Estimated correlation function $\hat{\xi}$ of the SDSS DR7-Full sample with a flat $\Lambda$CDM fiducial cosmology with $\Omega_m = 0.25$. We give the error bars as the diagonal part $\sqrt{C_{ii}}$ of the covariance matrix obtained from 2000 lognormal simulations with parameters $\omega_m = 0.13$, $\alpha = 1$ and $b = 2.5$. The BAO peak is a bit wider than expected, which is explained by signal variance in Kazin et al. (2010).

In the latter case, this makes it possible to obtain high BAO detection levels that we do not manage to obtain here otherwise ($3.6\sigma$ in Percival et al. (2010) and $3\sigma$ before reconstruction in Xu et al. (2012)). Therefore we will not study BAO detection here. Another reason is that the presence of BAOs in large-scale structures is becoming hard to dispute after recent results from the surveys WiggleZ ($3.2\sigma$ detection in Blake et al. (2011a)), 6dF ($2.4\sigma$ detection in Beutler et al. (2011)) and BOSS ($5\sigma$ detection in Anderson et al. (2012)). Finally let us mention that wavelet analysis has also yielded high detection levels using SDSS DR7 samples ($4.4\sigma$ in Arnalte-Mur et al. (2012) and $4\sigma$ in Tian et al. (2011)).

So we will focus on cosmological parameter constraints using the SDSS DR7-Full sample described in section 2. Because we use a relatively simple correlation function modeling, our study is not meant to improve cosmological parameter constraints. We only attempt to quantify the modeling error introduced by the approximation of a constant covariance $C$ instead of a model-dependent $C_\theta$.

4. LOGNORMAL SIMULATIONS 

In this section we describe our procedure for generating lognormal simulations that will provide us with a model-dependent covariance matrix $C_\theta$. In our lognormal simulations we use the same sky coverage and the same number density as in the SDSS DR7-Full sample.

To generate lognormal realizations we use the same method as in Labatie et al. (2012a): we generate a continuous galaxy field in a cube from an input correlation function, we apply the SDSS DR7-Full selection function (which incorporates the angular mask and the number density), and finally we Poisson sample the resulting continuous field.

For computational reasons we do not estimate the correlation function $\hat{\xi}$ on the full sky, but separately on the Northern Galactic Cap, $\hat{\xi}_{NGC}$, and Southern Galactic Cap, $\hat{\xi}_{SGC}$, which can be considered as independent. Also for computational reasons we use random catalogues with the same density as the SDSS DR7-Full sample.
From these measurements we obtain the model-dependent covariance matrices $C_{NGC,\theta}$ and $C_{SGC,\theta}$ by computing the empirical covariance matrices (as in equations (1) and (2)). For each simulation, corresponding to a parameter $\theta$, we obtain the full correlation function $\hat{\xi}$ by the same optimal linear combination as in White et al. (2011) (see the appendix)

$$\hat{\xi} = C_\theta \left( C_{NGC,\theta}^{-1} \, \hat{\xi}_{NGC} + C_{SGC,\theta}^{-1} \, \hat{\xi}_{SGC} \right) \qquad (14)$$

$$C_\theta = \left( C_{NGC,\theta}^{-1} + C_{SGC,\theta}^{-1} \right)^{-1} \qquad (15)$$

with $C_\theta$ the resulting covariance matrix of the full correlation $\hat{\xi}$.
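This inverse-covariance combination of two independent measurements can be sketched in a few lines. The function name and the toy diagonal covariances are assumptions chosen so that the result is easy to verify by hand.

```python
import numpy as np

def combine_caps(xi_ngc, cov_ngc, xi_sgc, cov_sgc):
    """Inverse-covariance weighted combination of two independent estimates."""
    inv_ngc = np.linalg.inv(cov_ngc)
    inv_sgc = np.linalg.inv(cov_sgc)
    # Covariance of the combined estimate
    cov = np.linalg.inv(inv_ngc + inv_sgc)
    # Combined estimate: each cap weighted by its inverse covariance
    xi = cov @ (inv_ngc @ xi_ngc + inv_sgc @ xi_sgc)
    return xi, cov

# Toy example: the second measurement is three times noisier
xi_comb, cov_comb = combine_caps(
    np.array([1.0, 2.0]), np.eye(2),
    np.array([3.0, 6.0]), 3.0 * np.eye(2),
)
```

In the toy case the combined vector is (1.5, 3.0) with covariance 0.75 times the identity, i.e. the less noisy cap dominates and the combined variance is smaller than either input.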

As we stated in section 3.1, we only take into account 3 main parameters in the correlation function, i.e. $\theta = (\omega_m, \alpha, B)$.

The parameter $\omega_m$ changes the whole shape of the correlation function, so we have no choice but to generate different sets of lognormal simulations for different values of $\omega_m$. We choose to use 5 values $\omega_m = 0.08, 0.105, 0.13, 0.155, 0.18$ and simply interpolate linearly the covariance matrix for intermediate values (more precisely, each coefficient of the covariance matrix is linearly interpolated).
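The coefficient-wise linear interpolation can be sketched as follows. The function name, the array layout (one covariance matrix per grid value stacked along the first axis) and the toy matrices are assumptions of the sketch.

```python
import numpy as np

OMEGA_M_GRID = np.array([0.08, 0.105, 0.13, 0.155, 0.18])

def interpolate_covariance(omega_m, grid, cov_stack):
    """Linear interpolation of each covariance coefficient in omega_m.

    grid      : increasing omega_m values with simulated covariances
    cov_stack : array of shape (len(grid), n_bins, n_bins)
    """
    if not grid[0] <= omega_m <= grid[-1]:
        raise ValueError("omega_m outside the simulated range")
    # Index of the grid point just above omega_m, clipped to a valid interval
    hi = int(np.searchsorted(grid, omega_m))
    hi = min(max(hi, 1), len(grid) - 1)
    lo = hi - 1
    t = (omega_m - grid[lo]) / (grid[hi] - grid[lo])
    return (1.0 - t) * cov_stack[lo] + t * cov_stack[hi]

# Toy stack: covariance k is (k + 1) times the identity
toy_stack = np.array([(k + 1.0) * np.eye(2) for k in range(5)])
cov_mid = interpolate_covariance(0.1175, OMEGA_M_GRID, toy_stack)
```

Halfway between 0.105 and 0.13 the toy result is 2.5 times the identity. Since a convex combination of positive-definite matrices is positive definite, the interpolated matrix remains a valid covariance.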

The parameter $\alpha$, on the other hand, only creates a dilation of the galaxy survey and thus of the apparent correlation function. This is only a geometrical effect due to a wrong fiducial cosmology. It is thus possible to take it into account using a single set of simulations.

First we must take into account that if the survey extends from a minimum distance $r_{\min}$ to a maximum distance $r_{\max}$ in fiducial coordinates, it extends from $\alpha r_{\min}$ to $\alpha r_{\max}$ in comoving coordinates. So for a simulation parameter $\alpha$, one must consider cuts at these distances $\alpha r_{\min}$ and $\alpha r_{\max}$ and then artificially dilate the survey by a factor $\alpha$ to mimic the effect of a wrong fiducial cosmology.

So instead of producing simulations that extend from $r_{\min}$ to $r_{\max}$, we produce simulations that extend from $\alpha_{\min} r_{\min}$ to $\alpha_{\max} r_{\max}$, where $\alpha_{\min}$ and $\alpha_{\max}$ are the minimum and maximum values of $\alpha$ considered. In this way we are always able to consider cuts at distances $\alpha r_{\min}$ and $\alpha r_{\max}$. In this study we choose $\alpha_{\min} = 0.8$ and $\alpha_{\max} = 1.2$. Given the value $D_{V,\mathrm{fid}}(0.3) = 1180$ Mpc for our fiducial cosmology, we get a probed range $D_V(0.3) \in [944\ \mathrm{Mpc}, 1416\ \mathrm{Mpc}]$.

There is another complication because the apparent density must be in agreement with the one observed in the data catalogue. So in addition to the cuts between $\alpha r_{\min}$ and $\alpha r_{\max}$, we introduce a varying selection function that depends on $\alpha$, so that the observed density after the dilation by $\alpha$ agrees with the one of the data catalogue.

We developed an optimized procedure for computing the correlation function in this context. First, because the correlation function is estimated by pair-counting, the estimation can be done in comoving coordinates (i.e. before the dilation) and the dilation is only applied after the pair-counting by dilating bin ranges. The density in comoving space is given by

$$n_\alpha(r) = \frac{1}{\alpha^3} \, n\!\left(\frac{r}{\alpha}\right) \qquad (16)$$

with $n(r)$ the observed density in the data catalogue and the factor $1/\alpha^3$ accounting for the change of density because of the dilation.

So the original lognormal simulations are generated with a density $n_{\max}(r) = \max_\alpha n_\alpha(r)$. Let us define the selection function $\Phi_\alpha(r) = n_\alpha(r) / n_{\max}(r)$. We apply this selection function for every value of $\alpha$ in the following way: for each galaxy at distance $r$ in the original simulation, we generate a random uniform variable $u \in [0, 1]$. Then the galaxy belongs to the simulation with value $\alpha$ if $u < \Phi_\alpha(r)$.
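The thinning step can be sketched as below. The function name and the array layout (densities $n_\alpha(r)$ pre-evaluated at each galaxy's distance, one row per $\alpha$) are assumptions of the sketch. The key point is that a single draw $u$ per galaxy is shared by all values of $\alpha$.

```python
import numpy as np

def membership(n_alpha_at_r, u):
    """Decide which alpha-simulations each galaxy belongs to.

    n_alpha_at_r : array (n_alpha, n_gal), density n_alpha(r) evaluated
                   at the distance of each galaxy
    u            : array (n_gal,), one uniform draw in [0, 1] per galaxy
    Returns a boolean array (n_alpha, n_gal).
    """
    n_alpha_at_r = np.asarray(n_alpha_at_r, dtype=float)
    # Density of the parent simulation: maximum over alpha at each distance
    n_max = n_alpha_at_r.max(axis=0)
    phi = n_alpha_at_r / n_max
    # The same u is compared with Phi_alpha for every alpha
    return np.asarray(u)[None, :] < phi

# Toy example: 2 values of alpha, 2 galaxies
densities = np.array([[1.0, 2.0],
                      [0.5, 4.0]])
u = np.array([0.7, 0.3])
keep = membership(densities, u)
```

In the toy case the first galaxy is kept only for the first value of $\alpha$, while the second galaxy is kept for both.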

For each galaxy $x_i$ we end up with a sequence of intervals $[\alpha_i, \alpha_{i'}]$ for which the galaxy belongs to the simulations. To optimize the computation of the correlation function we create a new galaxy at the same position for every distinct interval $[\alpha_i, \alpha_{i'}]$, so that each galaxy is associated with a single contiguous range of $\alpha$.

Let us consider only the pair counting term $DD$, the same argument applying to $DR$ and $RR$. For every $r$ we consider an array $(DD_{\alpha_i,\mathrm{raw}}(r))_{i=1,\dots,n}$ corresponding to the grid $\alpha = (\alpha_1, \dots, \alpha_n)$. Its entry $i$ counts the number of pairs to add to $DD_{\alpha_{i-1}}(r)$ to obtain $DD_{\alpha_i}(r)$.

For every pair $(x_k, x_l)$ with $\alpha$ ranges respectively equal to $[\alpha_k, \alpha_{k'}]$ and $[\alpha_l, \alpha_{l'}]$, the pair belongs to the simulations for the range $[\max(\alpha_k, \alpha_l), \min(\alpha_{k'}, \alpha_{l'})] = [\alpha_{\max(k,l)}, \alpha_{\min(k',l')}]$ (when this range is non-empty). So we add $+1$ to $DD_{\alpha,\mathrm{raw}}(r)$ for $\alpha = \alpha_{\max(k,l)}$ and add $-1$ for $\alpha = \alpha_{\min(k',l')+1}$. In the end we obtain the $\alpha$-dependent $DD_\alpha(r)$ as

$$DD_{\alpha_i}(r) = \sum_{j=1}^{i} DD_{\alpha_j,\mathrm{raw}}(r) \qquad (17)$$

Finally we only have to perform the dilation on $DD_{\alpha_i}(r)$, which was computed in comoving coordinates

$$DD^{\mathrm{final}}_{\alpha}(r) = DD_{\alpha}(\alpha r) \qquad (18)$$

This whole procedure makes it possible to obtain $DD$, $DR$ and
$RR$ for every $r$ and every $\alpha$ with a computation time
increased only by a factor $\approx 4$, instead of being
proportional to the number of $\alpha$ values.
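The bookkeeping behind equation (17) is a difference array followed by a running sum. The sketch below is a minimal illustration, not the authors' code: the names `pair_alpha_range` and `dd_per_alpha` are hypothetical, separations are assumed to be already binned, and $\alpha$ ranges are given as inclusive grid-index pairs.

```python
import numpy as np

def pair_alpha_range(range_k, range_l):
    """Intersection of two inclusive alpha-index ranges, or None if empty."""
    first = max(range_k[0], range_l[0])
    last = min(range_k[1], range_l[1])
    return (first, last) if first <= last else None

def dd_per_alpha(pairs, n_alpha, n_bins):
    """pairs: iterable of (r_bin, first, last), the pair existing for
    alpha-grid indices first..last inclusive.

    Returns an (n_alpha, n_bins) array of pair counts per alpha value.
    """
    # One extra row absorbs the -1 written at index last + 1.
    raw = np.zeros((n_alpha + 1, n_bins), dtype=np.int64)
    for r_bin, first, last in pairs:
        raw[first, r_bin] += 1      # pair appears at alpha_first
        raw[last + 1, r_bin] -= 1   # pair is gone after alpha_last
    # Running sum over the alpha grid, as in equation (17).
    return np.cumsum(raw, axis=0)[:n_alpha]
```

Each pair costs two array updates regardless of how many $\alpha$ values it spans, which is why the total cost does not grow in proportion to the size of the $\alpha$ grid.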

Finally let us turn to the third parameter $B = b^2$,
which changes the real galaxy distribution in comoving
space, just like $\omega_m$. But because it is simply a
constant multiplicative factor $B$ in the correlation function,
it should give approximately a factor $B^2$ in the
covariance matrix of $\hat{\xi}$. We recall that there are two
different sources of noise in the estimator $\hat{\xi}$:

• Cosmic variance due to the finite extent of the cat- 
alogue 

• Shot noise due to the finite number of galaxies to 
map an underlying continuous field 

The approximation of a covariance matrix scaling as
$B^2$ is valid when we can neglect the shot noise
contribution compared to the cosmic variance contribution. So
it holds better for large values of $b$. However we verify
in section 5.3 that it is a good approximation around
reasonable values of $b$, with the approximation $B^2 C$
being much closer to the real covariance matrix than the
approximation of a constant $C$. So this parameter will
actually be treated without any need for more simulations.

Fig. 4. — Mean estimators $\bar{\xi}_{\omega_m}$ in dashed lines compared to the
real correlation function $\xi_{\omega_m}$ in solid lines for $\alpha = 1$ and for
$\omega_m$ = 0.08 (purple), 0.105 (light blue), 0.13 (green), 0.155
(yellow), 0.18 (red).

Our main set of simulations will be performed with
$b = 2.5$ (note that this value is with respect to the
real-space correlation, i.e. without the boost factor of Kaiser
(1986)). For each value of $(\omega_m, \alpha)$ we will use $N = 2000$
lognormal simulations to estimate the covariance matrix.

5. RESULTS ON THE STATISTICAL MODELING OF $\hat{\xi}$
5.1. Absence of bias in $\hat{\xi}$

We first test whether there is a bias affecting the
estimators of the correlation function in our lognormal
simulations. This is important for cosmological parameter
constraints because the expected value of $\hat{\xi}$ is assumed
to be given by a model $\xi_\theta$ (see section



(19) 



To verify that the bias is negligible we compute the
mean of the measured correlation function for $\alpha = 1$ and
for the different values $\omega_m = 0.08, 0.105, 0.13, 0.155, 0.18$,
using $N = 2000$ lognormal simulations in each case:

$$\bar{\xi}_{\omega_m}(r_i) = \frac{1}{N} \sum_{k=1}^{N} \hat{\xi}_{\omega_m,k}(r_i) \tag{20}$$



We plot in figure 4 the resulting mean estimators $\bar{\xi}_{\omega_m}$
compared to the real correlation function $\xi_{\omega_m}$, which is
given as input to the lognormal simulations. Figure 4 shows
a very good agreement, i.e. the estimators are nearly
unbiased.

5.2. Verification of the Gaussianity of $\hat{\xi}$

Now we want to verify the Gaussianity of the measured
correlation function $\hat{\xi}$, i.e. again to verify that the
following hypothesis is realistic:

$$\exists\, \theta \in \Theta \ \ \text{s.t.} \ \ \hat{\xi} \sim \mathcal{N}(\xi_\theta, C_\theta)$$

For this we use the correlation function estimates $\hat{\xi}$
on the $N = 80$ Las Damas realizations presented in
section 3.1. Indeed they are more realistic than our
lognormal realizations. For example the broadening of the
BAO feature appears through non-linear evolution in the
Las Damas simulations, whereas it is simply 'injected'
through the input correlation function in our lognormal
realizations.




Fig. 5. — Estimated pdf of $\chi^2$ (red) using the histogram on the
80 Las Damas realizations and pdf of a $\chi^2_{18}$ distribution (black).
Error bars give the Poisson uncertainty in the estimate due to the
finite number of realizations.

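First we compute the empirical mean and empirical
covariance matrix of the Las Damas realizations:

$$\bar{\xi}(r_i) = \frac{1}{N} \sum_{k=1}^{N} \hat{\xi}_k(r_i) \tag{21}$$

$$C_{ij} = \frac{1}{N} \sum_{k=1}^{N} \left[\hat{\xi}_k(r_i) - \bar{\xi}(r_i)\right] \left[\hat{\xi}_k(r_j) - \bar{\xi}(r_j)\right] \tag{22}$$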



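We then compute the $\chi^2$ statistic for each realization,
which should approximately follow a $\chi^2_n$ law with $n = 18$
if the measurement $\hat{\xi}$ is Gaussian:

$$\chi^2_k = \sum_{i,j} \left[\hat{\xi}_k(r_i) - \bar{\xi}(r_i)\right] (C^{-1})_{ij} \tag{23}$$
$$\qquad\qquad \times \left[\hat{\xi}_k(r_j) - \bar{\xi}(r_j)\right] \tag{24}$$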



We show in figure 5 the histogram of $\chi^2$ on the 80 Las
Damas realizations compared to the probability density
function (pdf) of a $\chi^2_n$ variable with $n = 18$. We can see
the very good agreement between the two distributions.
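The test of equations (21) to (24) can be sketched as follows. This is an illustration only: the function name `chi2_per_realization` is hypothetical, the mock data are synthetic Gaussian draws standing in for the Las Damas measurements, and the covariance is assumed to use a $1/N$ normalization.

```python
import numpy as np

def chi2_per_realization(xi):
    """xi: array of shape (N, n), N estimates of the correlation
    function in n bins. Returns the N chi-square values."""
    mean = xi.mean(axis=0)                 # empirical mean, equation (21)
    resid = xi - mean
    cov = resid.T @ resid / xi.shape[0]    # empirical covariance, equation (22)
    # Solve C y = resid^T instead of forming the explicit inverse.
    solved = np.linalg.solve(cov, resid.T)
    return np.einsum("ki,ik->k", resid, solved)

# Synthetic stand-in for 80 realizations in 18 bins (not survey data).
rng = np.random.default_rng(1)
mock = rng.multivariate_normal(np.zeros(18), np.eye(18), size=80)
chi2 = chi2_per_realization(mock)
```

The histogram of `chi2` can then be compared to a $\chi^2_{18}$ pdf. Note that, with the mean and covariance estimated from the same realizations and a $1/N$ normalization, the sample mean of `chi2` equals $n$ by construction, so only the shape of the histogram carries information.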

5.3. Dependence of $C_\theta$ on $\omega_m$, $\alpha$ and $B$

Here we describe the dependence of $C_\theta$ (obtained from
our full set of lognormal simulations) with respect to $\omega_m$,
$\alpha$ and $B$.

First we check that the dependence of $C_\theta$ on
$B = b^2$ can actually be approximated as
$C_{\omega_m,\alpha,B} \propto B^2 C_{\omega_m,\alpha}$. For this we compare the covariance
matrix $C_1 = C_{\omega_m,\alpha,B_1}$ to the covariance matrix
$C_2 = C_{\omega_m,\alpha,B_2}$, obtained in each case from $N = 2000$
lognormal simulations, respectively with $\omega_m = 0.13$, $\alpha = 1$,
$B_1 = 2.5^2$ and $\omega_m = 0.13$, $\alpha = 1$, $B_2 = 3.0^2$.

We compute the L2 distance between $(B_2/B_1)^2 C_1$ and
$C_2$, and compare it to the L2 distance between $C_1$ and
$C_2$:

$$\frac{\left\|(B_2/B_1)^2 C_1 - C_2\right\|_2}{\left\|C_1 - C_2\right\|_2} = 0.22 \tag{25}$$



So we obtain that the approximation
$C_{\omega_m,\alpha,B} \propto B^2 C_{\omega_m,\alpha}$ is about 5 times better than the
approximation of a constant covariance matrix, which
justifies our approximation.
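The ratio of equation (25) is straightforward to compute once the two covariance matrices are available. In this sketch the function name `scaling_ratio` is hypothetical, and the "L2 distance" between matrices is assumed to be the entrywise (Frobenius) norm, which is what `np.linalg.norm` returns by default for a 2-D array.

```python
import numpy as np

def scaling_ratio(c1, c2, b1, b2):
    """Distance between the rescaled C_1 and C_2, relative to the
    distance between C_1 and C_2 (equation 25).

    b1 and b2 are the amplitude parameters B_1 and B_2 (B = b^2).
    """
    rescaled = (b2 / b1) ** 2 * c1
    return np.linalg.norm(rescaled - c2) / np.linalg.norm(c1 - c2)
```

A value well below 1 means that the $B^2$ rescaling explains most of the difference between the two matrices. For the values used here, $B_1 = 2.5^2 = 6.25$ and $B_2 = 3.0^2 = 9$, the rescaling factor is $(B_2/B_1)^2 \approx 2.07$.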

Next we outline the significant dependence of $C_\theta$
with respect to the two other parameters $\omega_m$ and $\alpha$.
We start by analyzing the dependence of $C_\theta$ with
respect to $\omega_m$ in the case $\alpha = 1$ and $B = 2.5^2$. We
show in figures 6 and 7 the variations of $C_\theta$ for
$\omega_m = 0.08, 0.105, 0.13, 0.155, 0.18$. For clarity we
distinguish between the correlation matrix $\rho_\theta$ (i.e. the
covariance matrix normalized by the diagonal elements) of
formula (26) and the diagonal part $\sigma_\theta = (\sqrt{C_{\theta,ii}})$, which
together fully describe the covariance matrix.



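$$\rho_{\theta,ij} = \frac{C_{\theta,ij}}{\sqrt{C_{\theta,ii}\, C_{\theta,jj}}} = \frac{C_{\theta,ij}}{\sigma_{\theta,i}\, \sigma_{\theta,j}} \tag{26}$$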
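The decomposition of formula (26) amounts to two lines of array code. The sketch below is illustrative only, and the function name `split_covariance` is hypothetical.

```python
import numpy as np

def split_covariance(cov):
    """Split a covariance matrix into its diagonal part sigma and its
    correlation matrix rho, following formula (26)."""
    sigma = np.sqrt(np.diag(cov))
    rho = cov / np.outer(sigma, sigma)
    return sigma, rho
```

The covariance matrix is recovered as `rho * np.outer(sigma, sigma)`, which is why the two parts can be plotted separately without losing information.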



We recall that the correlation function has $n = 18$ bins
of size $dr = 10\,h^{-1}$ Mpc from $20\,h^{-1}$ Mpc to $200\,h^{-1}$ Mpc.
We find a strong dependence of the whole covariance
matrix with respect to $\omega_m$, i.e. both the diagonal part $\sigma_\theta$
and the correlation matrix $\rho_\theta$ have a strong dependence.




Fig. 6. — Dependence of $\rho_\theta$ with respect to $\omega_m$, in the case $\alpha = 1$
and $B = 2.5^2$. We plot $\rho_\theta$ for $\omega_m$ = 0.08 (top left), 0.105 (top
middle), 0.13 (top right), 0.155 (bottom left), 0.18 (bottom middle).
The correlation between bins strongly increases for smaller values
of $\omega_m$. We have plotted the $n = 18$ bins of size $dr = 10\,h^{-1}$ Mpc
from $20\,h^{-1}$ Mpc to $200\,h^{-1}$ Mpc.



Fig. 7. — Dependence of $\sigma_\theta = (\sqrt{C_{\theta,ii}})$ with respect to $\omega_m$, in
the case $\alpha = 1$ and $B = 2.5^2$, as a function of $r$ ($h^{-1}$ Mpc).
We plot $\sigma_\theta$ for $\omega_m$ = 0.08 (purple), 0.105 (light blue),
0.13 (green), 0.155 (yellow), 0.18 (red). The diagonal variance
strongly increases for smaller values of $\omega_m$.

Finally we analyze the dependence of $C_\theta$ with
respect to $\alpha$ in the case $\omega_m = 0.13$ and $B = 2.5^2$.
We show in figures 8 and 9 the variations of $C_\theta$ for
$\alpha = 0.8, 0.9, 1.0, 1.1, 1.2$, again plotting separately the
correlation matrix $\rho_\theta$ and the diagonal part $\sigma_\theta$. We also
find a dependence of both $\rho_\theta$ and $\sigma_\theta$ with respect to $\alpha$,
but this dependence is not as strong as for $\omega_m$. Note that
this conclusion depends on the ranges of parameter
values, but here we considered fairly standard ranges.




Fig. 8. — Dependence of $\rho_\theta$ with respect to $\alpha$, in the case
$\omega_m = 0.13$ and $B = 2.5^2$. We plot $\rho_\theta$ for $\alpha$ = 0.8 (top left),
0.9 (top middle), 1.0 (top right), 1.1 (bottom left), 1.2 (bottom
middle). The correlation between bins increases for smaller values
of $\alpha$. We have plotted the $n = 18$ bins of size $dr = 10\,h^{-1}$ Mpc
from $20\,h^{-1}$ Mpc to $200\,h^{-1}$ Mpc.




Fig. 9. — Dependence of $\sigma_\theta$ with respect to $\alpha$, in the case
$\omega_m = 0.13$ and $B = 2.5^2$. We plot $\sigma_\theta$ for $\alpha$ = 0.8 (purple), 0.9
(light blue), 1.0 (green), 1.1 (yellow), 1.2 (red). The variance
increases for smaller values of $\alpha$.



6. EFFECT OF $C_\theta$ FOR COSMOLOGICAL PARAMETER
CONSTRAINTS

To obtain cosmological parameter constraints from
BAOs one usually performs a likelihood analysis using
the whole correlation function (Eisenstein et al. 2005;
Sanchez et al. 2009; Beutler et al. 2011; Blake et al.
2011a,b) or power spectrum (Cole et al. 2005; Tegmark
et al. 2006; Padmanabhan et al. 200..; Reid et al. 2010;
Ho et al. 2012), though some studies effectively restrict
the analysis to the position of the BAO peak (Percival
et al. 2007, 2010; Kazin et al. 2010; Mehta et al. 2012).

One supposes that the following hypothesis is true and
wants to constrain the parameter $\theta$:

$$\exists\, \theta \in \Theta \ \ \text{s.t.} \ \ \hat{\xi} \sim \mathcal{N}(\xi_\theta, C_\theta)$$

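To obtain posterior information on $\theta$ one adopts a
Bayesian point of view and assumes a prior $p(\theta)$. Then
the posterior of $\theta$ given the measurement $\hat{\xi}$ follows
from Bayes' theorem:

$$p(\theta \mid \hat{\xi}) \propto p(\theta)\, p(\hat{\xi} \mid \theta) = p(\theta)\, \mathcal{L}_\theta(\hat{\xi}) \tag{27}$$

The combination of the measurement $\hat{\xi}$ with other
independent experiments can be done inside the prior. For
example with CMB data the posterior is given by

$$\begin{aligned}
p(\theta \mid \mathrm{CMB}, \hat{\xi}) &\propto p(\theta, \mathrm{CMB}, \hat{\xi}) \\
&\propto p(\theta, \mathrm{CMB})\, p(\hat{\xi} \mid \theta, \mathrm{CMB}) \\
&\propto p(\theta \mid \mathrm{CMB})\, \mathcal{L}_\theta(\hat{\xi})
\end{aligned} \tag{28}$$

where we used the independence of the $\hat{\xi}$ and CMB
measurements. Adding the CMB measurement is thus equivalent
to using a prior $p(\theta) = p(\theta \mid \mathrm{CMB})$.

To constrain $\theta$ only from the measurement $\hat{\xi}$, the
question of choosing a prior $p(\theta)$ can be difficult. In this
study we take a constant prior $p(\theta)$, but note that this
choice is arbitrary. So the posterior is proportional to the
likelihood

$$\mathcal{L}_\theta(\hat{\xi}) \propto |C_\theta|^{-1/2}\, e^{-\frac{1}{2} (\hat{\xi} - \xi_\theta)^T C_\theta^{-1} (\hat{\xi} - \xi_\theta)} \tag{29}$$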
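For concreteness, the logarithm of the likelihood (29) can be evaluated as in the sketch below. The function name `log_likelihood` and the `include_logdet` switch are hypothetical; the switch is only a convenience to reproduce the constant-covariance case, where the determinant is the same for every $\theta$ and can be dropped.

```python
import numpy as np

def log_likelihood(xi_hat, xi_model, cov, include_logdet=True):
    """Log of the Gaussian likelihood (29), up to an additive constant.

    xi_hat: measured correlation function (length n)
    xi_model: model correlation function xi_theta (length n)
    cov: covariance matrix, either C_theta or a constant C
    """
    resid = xi_hat - xi_model
    chi2 = resid @ np.linalg.solve(cov, resid)
    logl = -0.5 * chi2
    if include_logdet:
        # slogdet avoids overflow or underflow of the raw determinant.
        _, logdet = np.linalg.slogdet(cov)
        logl -= 0.5 * logdet
    return logl
```

With a model-dependent covariance, `cov` changes with $\theta$ and the determinant term must be kept. With a constant covariance it only shifts all log-likelihood values by the same amount.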

In all the following we compare the posterior obtained
using our model-dependent $C_\theta$ to the posterior obtained
with the constant covariance matrix $C = C_{\theta_0}$ for the
particular value $\theta_0 = (\omega_m, \alpha, B) = (0.13, 1.0, 2.5^2)$. We only plot
the 2D posteriors $p(\omega_m, D_V(0.3) \mid \hat{\xi})$ (we recall the simple
relation $\alpha = D_V(0.3)/D_{V,\mathrm{fid}}(0.3)$), i.e. after marginalizing
over $B = b^2$:

$$p(\omega_m, D_V(0.3) \mid \hat{\xi}) = \int_B p(\omega_m, D_V(0.3), B \mid \hat{\xi})\, dB \tag{30}$$

where we consider the following grid: $B \in [4.0, 9.0]$
with grid step $dB = 0.01$, $\omega_m \in [0.08, 0.18]$ with
grid step 0.00025 and $\alpha \in [0.8, 1.2]$ with grid step
0.001. This grid in $\alpha$ corresponds to a grid $D_V(0.3) \in$
[944 Mpc, 1416 Mpc] with grid step 1.18 Mpc.
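On a regular grid, the marginalization of equation (30) reduces to a sum along the $B$ axis. The following sketch assumes that unnormalized log-posterior values have already been tabulated on the $(\omega_m, \alpha, B)$ grid; the function name `marginalize_over_b` and the array layout are hypothetical.

```python
import numpy as np

def marginalize_over_b(log_post, d_b):
    """log_post: array of shape (n_omega, n_alpha, n_B) holding the
    unnormalized log-posterior on the grid.

    Returns the normalized 2D posterior on the (omega_m, alpha) grid.
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    shifted = np.exp(log_post - log_post.max())
    post_2d = shifted.sum(axis=-1) * d_b   # Riemann sum for equation (30)
    return post_2d / post_2d.sum()
```

With the grid steps quoted above, the full grid has 401 x 401 x 501 points, i.e. about $8 \times 10^7$ likelihood evaluations per posterior, so in practice one may prefer to accumulate the sum over $B$ slice by slice rather than store the full 3D array.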

6.1. Effect of $C_\theta$ on the SDSS DR7-Full

Here we work with the SDSS DR7-Full estimated
correlation function $\hat{\xi}$ of figure 3.

We plot in figures 10 and 11 the posterior
$p(\omega_m, D_V(0.3) \mid \hat{\xi})$, respectively for a constant covariance
matrix $C = C_{\theta_0}$ and for a model-dependent covariance
matrix $C_\theta$.

First we can notice that the posterior
$p(\omega_m, D_V(0.3) \mid \hat{\xi})$ is less regular and more 'noisy'
in the case of model-dependent $C_\theta$. This can be easily
explained by the noise in the estimation of $C_\theta$.

We also notice that the 2-dimensional posterior is not
well approximated by a 2-dimensional Gaussian
(characterized notably by elliptical contours), especially
in the case of constant $C$. We attribute this to the
behavior of the model correlation function for high $\omega_m$
and low $\alpha$ (bottom right of figure 10).




Fig. 10. — Posterior $p(\omega_m, D_V(0.3) \mid \hat{\xi})$ in the case of constant
covariance matrix $C = C_{\theta_0}$, with $\theta_0 = (\omega_m, \alpha, B) = (0.13, 1.0, 2.5^2)$
(position of the red cross on the figure), for the SDSS DR7-Full
measurement $\hat{\xi}$. We plot the $1\sigma$ to $5\sigma$ confidence
regions with the approximation that $p$ is a 2-dimensional
Gaussian. They correspond respectively to $-2\ln(p) = -2\ln(p_{\max}) +$
2.29, 6.16, 11.81, 19.32, 28.74 (see section 'Confidence Limits on
Estimated Model Parameters' in Press et al. (2007)). We obtain the
1-dimensional constraints $\omega_m = 0.145 \pm 0.016$ (10.8% precision)
and $D_V(0.3) = 1104 \pm 105$ Mpc (9.5% precision).




Fig. 11. — Posterior $p(\omega_m, D_V(0.3) \mid \hat{\xi})$ in the case of
model-dependent covariance matrix $C_\theta$ for the SDSS DR7-Full
measurement $\hat{\xi}$. We obtain the 1-dimensional constraints
$\omega_m = 0.140 \pm 0.011$ (7.9% precision) and $D_V(0.3) = 1114 \pm 74$ Mpc
(6.7% precision). There is a small shift in the position of the
posterior's maximum and the confidence regions get smaller when
considering a model-dependent $C_\theta$.

From the 2-dimensional posteriors we compute
1-dimensional posteriors on $\omega_m$ and $D_V(0.3)$ by
marginalizing over the other parameter. Then we compute
1-dimensional constraints, which we express as a symmetric
68% confidence interval ($1\sigma$ interval) around the
posterior's maximum.
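One way to turn a gridded 1-dimensional posterior into such a symmetric interval is sketched below. It is an assumption about the procedure, not a description of the authors' implementation: the function name `symmetric_interval` is hypothetical, the grid is assumed uniform, and the interval is grown one grid step at a time around the maximum until it contains 68% of the probability.

```python
import numpy as np

def symmetric_interval(grid, post_1d, level=0.68):
    """Return (position of the maximum, half-width) of the smallest
    interval centred on the posterior maximum that contains `level`
    of the total probability. Assumes a uniform grid."""
    prob = post_1d / post_1d.sum()
    step = grid[1] - grid[0]
    i_max = int(np.argmax(prob))
    for half in range(len(grid)):
        lo = max(i_max - half, 0)
        hi = min(i_max + half, len(grid) - 1)
        if prob[lo:hi + 1].sum() >= level:
            return grid[i_max], half * step
    raise ValueError("probability level not reached on this grid")
```

The 1-dimensional posterior itself is obtained from the 2D one by summing over the other axis, e.g. `post_2d.sum(axis=1)` for $\omega_m$ if $\omega_m$ is the first axis.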

In the case of constant covariance matrix $C$, we obtain
the constraints $\omega_m = 0.145 \pm 0.016$ (10.8% precision) and
$D_V(0.3) = 1104 \pm 105$ Mpc (9.5% precision), whereas
in the case of model-dependent covariance matrix $C_\theta$, we
obtain the constraints $\omega_m = 0.140 \pm 0.011$ (7.9%
precision) and $D_V(0.3) = 1114 \pm 74$ Mpc (6.7% precision).
In terms of $\alpha$, this gives respectively the constraints
$\alpha = 0.935 \pm 0.089$ for constant $C$ and $\alpha = 0.944 \pm 0.063$
for model-dependent $C_\theta$.

As can be seen when comparing figures 10 and 11, the
modeling error due to the approximation of constant $C$
is relatively small. Compared to the size of the $1\sigma$
intervals, the maximum likelihood positions are shifted by
respectively 31% for $\omega_m$ and 10% for $\alpha$. The $1\sigma$ intervals
also get reduced by respectively 31% for $\omega_m$ and 29% for
$\alpha$. However we will see in section 6.2 that the reduction
of the $1\sigma$ region is not systematic.

6.2. Quantifying the effect of $C_\theta$ on SDSS DR7-Full
simulations

The approximation of $C_\theta$ as a constant $C$ results in a
modeling error, which potentially depends on the
particular realization $\hat{\xi}$. So we want to quantify the general
effect of this approximation on cosmological parameter
constraints using a large number of realizations $\hat{\xi}$:

$$\hat{\xi} \sim \mathcal{N}(\xi_{\theta_0}, C_{\theta_0}) \tag{31}$$

with the choice $\theta_0 = (\omega_m, \alpha, B) = (0.13, 1.0, 2.5^2)$. For
each realization $\hat{\xi}$ we compute the 2-dimensional
posterior $p(\omega_m, \alpha \mid \hat{\xi})$ in the case of constant $C$ and
model-dependent $C_\theta$.
We look at two particular modeling errors:

• Error on the position of the 1-dimensional
posterior's maxima $\omega_m^{\max}$ and $\alpha^{\max}$

• Error on the size of the $1\sigma$ intervals $\sigma_{\omega_m}$ and $\sigma_\alpha$



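We adopt the following notations:

$$\delta\omega_m^{\max} = \left(\omega_m^{\max,C} - \omega_m^{\max,C_\theta}\right) / \sigma_{\omega_m}^{C} \tag{32}$$
$$\delta\sigma_{\omega_m} = \left(\sigma_{\omega_m}^{C} - \sigma_{\omega_m}^{C_\theta}\right) / \sigma_{\omega_m}^{C} \tag{33}$$
$$\delta\alpha^{\max} = \left(\alpha^{\max,C} - \alpha^{\max,C_\theta}\right) / \sigma_{\alpha}^{C} \tag{34}$$
$$\delta\sigma_{\alpha} = \left(\sigma_{\alpha}^{C} - \sigma_{\alpha}^{C_\theta}\right) / \sigma_{\alpha}^{C} \tag{35}$$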


We generate 2000 realizations following the model of
formula (31) and look at the different quantities $\delta\omega_m^{\max}$,
$\delta\sigma_{\omega_m}$, $\delta\alpha^{\max}$ and $\delta\sigma_\alpha$, which characterize the modeling
error due to an incorrect covariance matrix for each
realization $\hat{\xi}$. Each quantity is divided by the $1\sigma$ interval size
(the statistical uncertainty) in equations (32), (33), (34)
and (35) in order to compare the modeling error to the
statistical uncertainty.

We compute the mean values $\langle\delta\omega_m^{\max}\rangle$, $\langle\delta\sigma_{\omega_m}\rangle$, $\langle\delta\alpha^{\max}\rangle$
and $\langle\delta\sigma_\alpha\rangle$ to investigate a systematic shift in the
posterior's maxima or a systematic reduction of the $1\sigma$
intervals. However we found that these mean values are
negligible compared to the $1\sigma$ interval sizes ($\sim 2\%$).

Next we compute the mean absolute values
$\langle|\delta\omega_m^{\max}|\rangle$, $\langle|\delta\sigma_{\omega_m}|\rangle$, $\langle|\delta\alpha^{\max}|\rangle$ and $\langle|\delta\sigma_\alpha|\rangle$.
$\langle|\delta\omega_m^{\max}|\rangle$ and $\langle|\delta\alpha^{\max}|\rangle$ give the mean modeling error on
the position of the posterior's maxima compared to the $1\sigma$
interval sizes. On the other hand, $\langle|\delta\sigma_{\omega_m}|\rangle$ and $\langle|\delta\sigma_\alpha|\rangle$
give the mean modeling error on the $1\sigma$ interval sizes.
These absolute values actually correspond to what is
normally referred to as the modeling error (indeed for a
given realization $\hat{\xi}$, we do not really care about the sign
of the error but only about its amplitude). We show our
results in table 2.

As shown in table 2, there is a mean modeling error of
21% to 28% for the position of the posterior's maxima
and 7.5% to 10% for the size of the $1\sigma$ intervals. So the
position of the posterior's maxima is much more affected
by the modeling error than the size of the $1\sigma$ intervals.
However the error stays quite small compared to the $1\sigma$
intervals.



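TABLE 2

$\langle|\delta\omega_m^{\max}|\rangle$        21%
$\langle|\delta\sigma_{\omega_m}|\rangle$      7.5%
$\langle|\delta\alpha^{\max}|\rangle$          28%
$\langle|\delta\sigma_{\alpha}|\rangle$        10%

NOTES. — Importance of the modeling error compared to the $1\sigma$
interval size, both for the position of the posterior's maxima and
for the size of the $1\sigma$ intervals. We find a mean modeling error which
is quite small compared to the $1\sigma$ interval sizes.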

From table 2 we see that the error on the extremities
of the $1\sigma$ intervals is likely to stay below 21% + 7.5% =
28.5% for $\omega_m$ and 28% + 10% = 38% for $\alpha$. So a possible
way to handle the modeling error (though it cannot be
handled for sure, because it depends on the particular
realization $\hat{\xi}$) is to multiply the size of the $1\sigma$ intervals
obtained with a constant covariance matrix $C$ by a factor
$\approx 1.3$ for $\omega_m$ and $\approx 1.4$ for $\alpha$. In this way, the new $1\sigma$
intervals will very likely cover most of the real $1\sigma$ intervals
(i.e. the ones obtained with a model-dependent $C_\theta$).

Let us illustrate more clearly how the modeling error
can vary depending on the realization $\hat{\xi}$. In figure 12 we
show for each quantity $\delta\omega_m^{\max}$, $\delta\sigma_{\omega_m}$, $\delta\alpha^{\max}$ and $\delta\sigma_\alpha$ the
estimated probability density function (pdf) from their
histogram on 2000 realizations. We clearly see that the
small modeling error varies depending on the realization
$\hat{\xi}$.

Fig. 12. — Estimated pdf of $\delta\omega_m^{\max}$, $\delta\sigma_{\omega_m}$, $\delta\alpha^{\max}$ and $\delta\sigma_\alpha$ using
their histogram on 2000 realizations. Error bars give the Poisson
uncertainty in the estimate due to the finite number of realizations.

Finally we perform a visual inspection of the
2-dimensional posteriors $p(\omega_m, D_V(0.3) \mid \hat{\xi})$ in both cases
of constant covariance $C$ and model-dependent $C_\theta$. As in
section 6.1 we find deviations of the 2-dimensional
posteriors compared to 2-dimensional Gaussians for most
realizations $\hat{\xi}$. These deviations are located at high $\omega_m$
and low $\alpha$ and they happen both in the case of constant
$C$ and model-dependent $C_\theta$. So they are simply due to
the behavior of the model correlation function $\xi_\theta$ in this
region.

6.3. Quantifying the effect of $C_\theta$ for next-generation surveys

Finally we try to quantify this modeling error for
next-generation surveys. Our procedure is simply to divide
the covariance matrices $C$ and $C_\theta$ by a constant factor $c$
with $c = 2$ and $c = 4$, and repeat the analysis of section
6.2. To give an idea of what this represents in terms of
survey size, we can approximate doubling the survey size
as a factor 1/2 in the covariance matrix



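$$C_{\hat{\xi}_{12}} \approx C_{\frac{1}{2}(\hat{\xi}_1 + \hat{\xi}_2)} \tag{36}$$
$$\qquad = \frac{1}{4} C + \frac{1}{4} C = \frac{1}{2} C \tag{37}$$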



because the estimated correlation function $\hat{\xi}_{12}$ of survey
'1+2' is approximately the same as the mean of $\hat{\xi}_1$ and
$\hat{\xi}_2$ for large enough surveys. So a factor 1/2 in the
covariance matrix is approximately equivalent to doubling
the survey size, and similarly a factor 1/4 in the
covariance matrix is approximately equivalent to quadrupling
the survey size.
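Now we generate realizations from the model

$$\hat{\xi} \sim \mathcal{N}\!\left(\xi_{\theta_0}, \frac{1}{c}\, C_{\theta_0}\right) \tag{38}$$

The approximate likelihood (with constant covariance
matrix) and real likelihood (with model-dependent
covariance matrix) are now given by

$$\mathcal{L}^{C}_\theta(\hat{\xi}) \propto e^{-\frac{c}{2} (\hat{\xi} - \xi_\theta)^T C^{-1} (\hat{\xi} - \xi_\theta)} \tag{39}$$

$$\mathcal{L}^{C_\theta}_\theta(\hat{\xi}) \propto |C_\theta|^{-1/2}\, e^{-\frac{c}{2} (\hat{\xi} - \xi_\theta)^T C_\theta^{-1} (\hat{\xi} - \xi_\theta)} \tag{40}$$

We repeat the analysis of table 2 with 2000 realizations
of formula (38) for each case $c = 2$ and $c = 4$. We show
the results in table 3.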

TABLE 3

                                               c = 2     c = 4
$\langle|\delta\omega_m^{\max}|\rangle$        16%       13%
$\langle|\delta\sigma_{\omega_m}|\rangle$      6.3%      5.1%
$\langle|\delta\alpha^{\max}|\rangle$          23%       20%
$\langle|\delta\sigma_{\alpha}|\rangle$        8.5%      6.9%

NOTES. — Importance of the modeling error compared to the $1\sigma$
interval size, both for the position of the posterior's maxima and
for the size of the $1\sigma$ intervals when dividing $C$ and $C_\theta$ by factors
$c = 2$ and $c = 4$. Again we find a mean modeling error which is quite
small compared to the $1\sigma$ interval sizes. The error is smaller here
than for the SDSS DR7-Full simulations, and it decreases with the
survey size.



From table 3 we find again that there is a mean modeling
error which is quite small compared to the $1\sigma$ interval
sizes. The modeling error mainly affects the position of
the posterior's maxima. It is smaller here than for the
SDSS DR7-Full simulations, and it decreases with the
survey size.

Our conclusion is that the approximation of $C_\theta$ as a
constant $C$ only has a small impact on cosmological
parameter constraints. As surveys get larger the modeling
error decreases. Again an approximate way to handle
this modeling error is to multiply the size of the $1\sigma$
intervals by a factor $\approx 1.3$. We emphasize that our study is
not comprehensive and that we only took into account 3
parameters: $\theta = (\omega_m, \alpha, B)$.

This conclusion is a bit surprising since we found a
strong dependence of $C_\theta$ on $\theta$. However it is easy to
see that there is a competing effect at work in the
likelihood. Let us recall the expression of the likelihood for
a model-dependent covariance matrix $C_\theta$:

$$\mathcal{L}_\theta(\hat{\xi}) \propto |C_\theta|^{-1/2}\, e^{-\frac{1}{2} (\hat{\xi} - \xi_\theta)^T C_\theta^{-1} (\hat{\xi} - \xi_\theta)} \tag{41}$$

For example if we multiply the covariance matrix
by a constant factor $c$, the terms $|C_\theta|^{-1/2}$ and
$e^{-\frac{1}{2} (\hat{\xi} - \xi_\theta)^T C_\theta^{-1} (\hat{\xi} - \xi_\theta)}$ will have competing effects,
decreasing the overall effect on the likelihood. And we indeed
verified that the term $|C_\theta|^{-1/2}$ has an important
contribution in practice (i.e. if it is omitted, one obtains much
greater changes of the likelihood contours).
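The competing effect can be made explicit with a short worked calculation. As an illustrative assumption (this rescaling is not one of the cases studied above), suppose the covariance matrix is rescaled as $C_\theta \to c\, C_\theta$ for $n$ bins, and write $\chi^2 = (\hat{\xi} - \xi_\theta)^T C_\theta^{-1} (\hat{\xi} - \xi_\theta)$. Then

```latex
\begin{align*}
-2 \ln \mathcal{L}(c) &= \ln|c\, C_\theta| + \frac{\chi^2}{c} + \mathrm{const}
  = n \ln c + \ln|C_\theta| + \frac{\chi^2}{c} + \mathrm{const}, \\
\frac{\partial}{\partial c}\left[-2 \ln \mathcal{L}(c)\right]
  &= \frac{n}{c} - \frac{\chi^2}{c^2}.
\end{align*}
```

The derivative vanishes at $c = \chi^2 / n$. For a typical realization $\chi^2 \approx n$, so the likelihood is nearly stationary around $c = 1$: to first order the change in the determinant term cancels the change in the exponential. Without the determinant term, $-2 \ln \mathcal{L}$ would vary as $\chi^2 / c$ with no such cancellation.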

Finally we also perform a visual inspection of the
2-dimensional posteriors $p(\omega_m, D_V(0.3) \mid \hat{\xi})$ in both cases
of constant covariance $C$ and model-dependent $C_\theta$ for
$c = 2$ and $c = 4$. Because the maximum likelihood
is much closer to the real parameter $\theta_0$ of formula (38)
than in section 6.2 (because variations of $\hat{\xi}$ are smaller),
the region causing deviations from a 2-dimensional
Gaussian is nearly always outside the 2 to $3\sigma$ confidence
region. So we find that realizations $\hat{\xi}$ of formula (38) have
2-dimensional posteriors that can be very well
approximated by 2-dimensional Gaussians.

7. CONCLUSIONS 

In this paper we have studied the influence of
considering a realistic model-dependent covariance matrix
$C_\theta$ instead of a constant covariance matrix $C$ of the
estimated correlation function $\hat{\xi}$ for cosmological
parameter constraints. The main difficulty comes from the
very long computation time required to estimate such
a model-dependent covariance matrix $C_\theta$.

We have presented a new method to obtain a realistic
model-dependent $C_\theta$ in a reasonable time, for a
3-dimensional parameter $\theta = (\omega_m, \alpha, b^2)$ using lognormal
simulations. Compared to a constant covariance matrix,
the computing time is multiplied by a factor of roughly 20.
We plan to release (as part of a general toolbox on the
correlation function analysis of galaxy clustering) the
different programs to estimate a model-dependent $C_\theta$ for
different survey masks, selection functions and ranges of
cosmological parameters.

Our first results concern the statistical modeling of the
measured correlation function $\hat{\xi}$:

$$\exists\, \theta \in \Theta \ \ \text{s.t.} \ \ \hat{\xi} \sim \mathcal{N}(\xi_\theta, C_\theta) \tag{42}$$

We verified the absence of bias in our lognormal
simulations, i.e. that the expected value of the measured
correlation function $E(\hat{\xi})$ is indeed equal to the input model
$\xi_\theta$ in our simulations. Next we verified the Gaussianity
of the measurement $\hat{\xi}$ using 80 Las Damas realizations,
which are more realistic than our lognormal simulations.
We estimated the probability density function of a $\chi^2$
statistic on these 80 realizations, and found that it is
compatible with the expected result for Gaussian
realizations.

We also studied the dependence of $C_\theta$ with respect to
$\omega_m$, $\alpha$ and $B = b^2$. We found that the effect of the
amplitude parameter $b^2$ can be well approximated as a
constant factor $b^4$ in the covariance matrix (for $b$ high
enough, i.e. $> 2$). For the two other parameters $\omega_m$ and
$\alpha$, we found that their variations affect the whole shape
of the covariance matrix. However $\omega_m$ has a bigger effect
than $\alpha$ for usual ranges of parameter values.

Next we studied the implications of a model-dependent
$C_\theta$ for cosmological parameter constraints. More
precisely, we always compared the results obtained with $C_\theta$
to the results obtained with a constant $C = C_{\theta_0}$ for the
particular value $\theta_0 = (\omega_m, \alpha, b^2) = (0.13, 1.0, 2.5^2)$.

For the SDSS DR7-Full sample, we obtained $\omega_m =
0.145 \pm 0.016$ (10.8% precision) and $D_V(0.3) = 1104 \pm 105$
Mpc (9.5% precision) for a constant $C$, whereas we
obtained $\omega_m = 0.140 \pm 0.011$ (7.9% precision) and
$D_V(0.3) = 1114 \pm 74$ Mpc (6.7% precision) for a
model-dependent $C_\theta$. So there is only a small shift in the
position of the posterior's maxima, and the $1\sigma$ intervals get
a bit reduced when considering a model-dependent $C_\theta$.

However this effect is not systematic and depends on
the particular realization $\hat{\xi}$. In other words,
approximating $C_\theta$ as a constant $C$ results in a modeling error both
for the position of the posterior's maxima and for the
size of the $1\sigma$ intervals, which depends on the particular
realization $\hat{\xi}$. We quantified this modeling error using a
large number of SDSS DR7-Full simulations:

$$\hat{\xi} \sim \mathcal{N}(\xi_{\theta_0}, C_{\theta_0})$$

For each parameter, $\omega_m$ and $D_V(0.3)$, we studied the
error in the position of the posterior's maximum and in
the size of the $1\sigma$ interval. We found a mean modeling
error in the position of the posterior's maxima
approximately equal to 20% to 30% of the $1\sigma$ intervals. The
error in the size of the $1\sigma$ intervals is smaller and is
approximately equal to 10%.

Finally we did the same analysis for next-generation
surveys, simply by dividing the covariance matrix by a
factor $c$, with $c = 2$ and $c = 4$.

We also found a small mean modeling error on the
position of the posterior's maxima and on the size of the $1\sigma$
intervals. As the survey gets larger this modeling error
decreases. More precisely if we multiply the size of the
SDSS DR7-Full survey by a factor 4, the mean modeling
error on the position of the posterior's maxima reaches
$\approx 20\%$ of the $1\sigma$ interval size and the mean absolute error
of the $1\sigma$ interval size reaches $\approx 6\%$.

So our conclusion is that the modeling error due to
the approximation of $C_\theta$ as a constant $C$ is quite small.
However for a safer analysis (though this modeling error
cannot be handled for sure), one can multiply the size of
the $1\sigma$ intervals by a factor $\approx 1.3$.

This conclusion is a bit surprising since we found a
strong dependence of $C_\theta$ on $\theta$. However there is a
competing effect at work in the likelihood $\mathcal{L}_\theta(\hat{\xi})$ that tends
to erase scaling effects.

Computing $C_\theta$ with a higher-dimensional parameter
$\theta$ seems very difficult and cannot be addressed with
our procedure yet. The approach proposed in Xu et al.
(2012) of a semi-analytic $C_\theta$ seems very promising in that
respect. However it requires an ad hoc fitting of some
parameters. In order to perform this parameter fitting,
our simulations for a 3-dimensional parameter $\theta$ could
be very interesting to use. Such an analysis would make
it possible to see whether our conclusions are still correct when
considering a full set of cosmological parameters.

APPENDIX

OPTIMAL LINEAR COMBINATION OF ESTIMATORS

In this section we assume that we have two independent and unbiased Gaussian estimators $X_1$, $X_2$ (of dimension $n$)
of $X_0$ with respective covariance matrices $C_1$ and $C_2$:

$$X_1 \sim \mathcal{N}(X_0, C_1) \tag{A1}$$
$$X_2 \sim \mathcal{N}(X_0, C_2) \tag{A2}$$

We consider an unbiased estimator $X$ of $X_0$ as a linear combination of $X_1$ and $X_2$:

$$X = A X_1 + (\mathrm{Id} - A) X_2 \tag{A3}$$

with $A$ a square $n \times n$ matrix. The resulting covariance matrix is given by

$$C = E\left[(X - X_0)(X - X_0)^T\right] = A C_1 A^T + (\mathrm{Id} - A)\, C_2\, (\mathrm{Id} - A)^T \tag{A4}$$

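where we used the fact that $X_1$ and $X_2$ are independent. We will show that the following choice of $A$ gives an
extremum of $\det(C)$:

$$A = (C_1^{-1} + C_2^{-1})^{-1} C_1^{-1} \tag{A5}$$
$$\mathrm{Id} - A = (C_1^{-1} + C_2^{-1})^{-1} C_2^{-1} \tag{A6}$$

For this we use the following derivative formulae, with $B$ a symmetric $n \times n$ matrix:

$$\frac{\partial \det(C)}{\partial C} = \det(C)\, C^{-1} \tag{A7}$$
$$\frac{\partial (A B A^T)}{\partial A_{ji}} = A B J^{ij} + J^{ji} B A^T = A B J^{ij} + (A B J^{ij})^T \tag{A8}$$

with $(J^{ij})_{kl} = \delta_{ik}\delta_{jl}$. Differentiating $\det(C)$ with respect to $A$, we get

$$\frac{\partial \det(C)}{\partial A_{ji}} = \sum_{k,l} \det(C)\, (C^{-1})_{kl}\, \frac{\partial C_{kl}}{\partial A_{ji}} \tag{A9}$$

So it is sufficient to have for all $i,j$ that

$$\frac{\partial C}{\partial A_{ji}} = 0 \tag{A10}$$

$$\begin{aligned}
\frac{\partial C}{\partial A_{ji}} &= \frac{\partial (A C_1 A^T)}{\partial A_{ji}} + \frac{\partial \left((\mathrm{Id} - A)\, C_2\, (\mathrm{Id} - A)^T\right)}{\partial A_{ji}} \\
&= \left(A C_1 - (\mathrm{Id} - A) C_2\right) J^{ij} + \left[\left(A C_1 - (\mathrm{Id} - A) C_2\right) J^{ij}\right]^T
\end{aligned} \tag{A11}$$

Again it is sufficient to only have $A C_1 - (\mathrm{Id} - A) C_2 = 0$, which gives

$$A (C_1 + C_2) = C_2 \tag{A12}$$
$$A = C_2 (C_1 + C_2)^{-1} \tag{A13}$$
$$A = C_2 C_2^{-1} (C_1^{-1} + C_2^{-1})^{-1} C_1^{-1} \tag{A14}$$

So we obtain the solution given by equations (A5) and (A6):

$$A = (C_1^{-1} + C_2^{-1})^{-1} C_1^{-1} \tag{A15}$$
$$\mathrm{Id} - A = (C_1^{-1} + C_2^{-1})^{-1} C_2^{-1} \tag{A16}$$

Finally, inserting this expression of $A$ into equation (A4) we get

$$C = E\left[(X - X_0)(X - X_0)^T\right] = (C_1^{-1} + C_2^{-1})^{-1} \tag{A17}$$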
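The result of this appendix is easy to check numerically. The sketch below is only a sanity check under stated assumptions: the function names `optimal_combination` and `combined_covariance` are hypothetical, and the inputs are arbitrary symmetric positive-definite matrices rather than covariance matrices from the analysis.

```python
import numpy as np

def optimal_combination(c1, c2):
    """Weight matrix A and combined covariance for two independent,
    unbiased estimators with covariances c1 and c2."""
    inv1 = np.linalg.inv(c1)
    inv2 = np.linalg.inv(c2)
    c_comb = np.linalg.inv(inv1 + inv2)   # equation (A17)
    a = c_comb @ inv1                     # equation (A15)
    return a, c_comb

def combined_covariance(a, c1, c2):
    """Covariance of A X_1 + (Id - A) X_2 for any weight matrix A,
    following equation (A4)."""
    ident = np.eye(c1.shape[0])
    return a @ c1 @ a.T + (ident - a) @ c2 @ (ident - a).T
```

Evaluating `combined_covariance` at the optimal `a` reproduces `c_comb`, and perturbing `a` increases the determinant, which is consistent with the extremum being a minimum.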